Variational Program Inference
Authors
Abstract
We introduce a framework for representing a variety of interesting problems as inference over the execution of probabilistic model programs. We represent a "solution" to such a problem as a guide program, which runs alongside the model program and influences the model program's random choices, leading the model program to sample from a different distribution than its prior. Ideally, the guide program influences the model program to sample from the posterior given the evidence. We show how the KL divergence between the true posterior distribution and the distribution induced by the guided model program can be efficiently estimated (up to an additive constant) by sampling multiple executions of the guided model program. In addition, we show how to use the guide program as a proposal distribution in importance sampling to statistically prove lower bounds on the probability of the evidence and on the joint probability of a hypothesis and the evidence. We can use the quotient of these two bounds as an estimate of the conditional probability of the hypothesis given the evidence. We thus turn the inference problem into a heuristic search for better guide programs.

Problem Specification

Given partial observations of a complicated system governed by known or unknown probabilistic rules, we would like to automatically reason about the likely state of hidden parts of the system.

Systems with Known Rules

We model our system as a program in a general-purpose programming language. We call this the model program. We are agnostic about which programming language is used; we insist only that it be deterministic except for a choose function, which takes a probability distribution as an argument and returns a random choice from it. The model program thus defines a probability distribution P(x) over execution paths x. If, over the course of an execution path x of the model program, the choose function is called n times with distributions (P_1, P_2, ..., P_n) and the randomly chosen values are (c_1, c_2, ..., c_n) respectively, then the probability of that execution path is P(x) = ∏_i P_i(c_i).

We are interested in the conditional expected value E(h(x) | e), where h is some function of the execution path and e is some evidence such that we can easily compute P(e|x) for any x. Our programming language therefore needs constructs for specifying P(e|x) and h(x). The model program reports P(e|x) as the product of all calls to a function, evidence. This is particularly convenient in that the evidence function can be passed boolean values, which are interpreted as 0 or 1. For example, we could represent an observation that the grass is wet with the call evidence(grass_wet). If grass_wet is false, P(e|x) is multiplied by 0, and if grass_wet is true, P(e|x) is left unchanged. The value of the hypothesis h(x) is defined as the final value of a global variable h. We may not have a hypothesis, and may only be interested in sampling runs of the program given the evidence; in this case, h might not be set or used.

Example 1: Three fair dice are rolled and it is observed that their sum is 7. What is the probability that the first die rolled was a 5? The following model program (shown in pseudocode) might encode that problem:

    die1 := choose(uniform(1..6))
    die2 := choose(uniform(1..6))
    die3 := choose(uniform(1..6))
    evidence(die1 + die2 + die3 == 7)
    h := (die1 == 5)
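To make these conventions concrete, here is a minimal Python sketch of the interface and of Example 1. Only choose, evidence, uniform, and h come from the text; the ModelRun class and the rest of the scaffolding are our own assumptions, not the paper's implementation. It estimates E(h(x) | e) by sampling execution paths from the prior and weighting each path by P(e|x):

    import random

    class ModelRun:
        """One execution of a model program, tracking P(e|x) and the hypothesis h."""

        def __init__(self):
            self.evidence_weight = 1.0  # running product of all evidence(...) calls, i.e. P(e|x)
            self.h = None               # final value defines the hypothesis h(x)

        def choose(self, dist):
            """Return a random choice from dist, a dict mapping values to probabilities."""
            values = list(dist)
            return random.choices(values, weights=[dist[v] for v in values])[0]

        def evidence(self, factor):
            """Multiply P(e|x) by factor; booleans are interpreted as 0 or 1."""
            self.evidence_weight *= float(factor)

    def uniform(values):
        """The uniform distribution over values, as a value -> probability dict."""
        values = list(values)
        return {v: 1.0 / len(values) for v in values}

    def dice_model(run):
        """Example 1: three fair dice, with the observation that their sum is 7."""
        die1 = run.choose(uniform(range(1, 7)))
        die2 = run.choose(uniform(range(1, 7)))
        die3 = run.choose(uniform(range(1, 7)))
        run.evidence(die1 + die2 + die3 == 7)
        run.h = (die1 == 5)

    # Estimate E(h(x) | e) by sampling paths from the prior and weighting by P(e|x).
    sum_e, sum_he = 0.0, 0.0
    for _ in range(100000):
        run = ModelRun()
        dice_model(run)
        sum_e += run.evidence_weight
        sum_he += run.evidence_weight * run.h
    print("P(die1 = 5 | sum = 7) =", sum_he / sum_e)  # exact answer: 1/15

Sampling from the prior works here because the evidence is not too unlikely (15 of the 216 equally likely outcomes survive), but it degrades rapidly as the evidence becomes rarer; that is the role of the guide program.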
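A guide program can then be slotted in as the proposal at each choice point. The sketch below builds on the ModelRun sketch above and is again our own illustration: tilted_guide is a made-up guide, not the paper's guide-program language. Each guided execution accumulates the importance weight w(x) = P(x) · P(e|x) / G(x). The sample mean of w is an unbiased estimator of P(e), the mean of w · h(x) estimates P(h, e), their quotient estimates P(h|e), and the mean of -log w estimates KL(G || P(·|e)) - log P(e), i.e. the abstract's KL divergence up to an additive constant:

    import math

    class GuidedRun(ModelRun):
        """A model execution whose random choices are sampled from a guide.

        At each choice point the guide maps (choice index, prior distribution)
        to a proposal distribution; we accumulate log P(x) and log G(x) so the
        importance weight w(x) = P(x) * P(e|x) / G(x) can be recovered.
        """

        def __init__(self, guide):
            super().__init__()
            self.guide = guide
            self.log_p = 0.0   # log prior probability of the path so far
            self.log_g = 0.0   # log guide probability of the path so far
            self.n_choices = 0

        def choose(self, dist):
            g_dist = self.guide(self.n_choices, dist)
            self.n_choices += 1
            value = ModelRun.choose(self, g_dist)  # sample from the guide instead
            self.log_p += math.log(dist[value])
            self.log_g += math.log(g_dist[value])
            return value

    def tilted_guide(i, prior):
        """A hand-written (made-up) guide: tilt every die toward small values,
        since the evidence sum = 7 favors low rolls."""
        weights = {v: p * math.exp(-0.5 * v) for v, p in prior.items()}
        z = sum(weights.values())
        return {v: w / z for v, w in weights.items()}

    log_ws, hs = [], []
    for _ in range(100000):
        run = GuidedRun(tilted_guide)
        dice_model(run)
        if run.evidence_weight > 0:
            log_ws.append(run.log_p + math.log(run.evidence_weight) - run.log_g)
        else:
            log_ws.append(float("-inf"))  # path outside the evidence's support
        hs.append(run.h)

    ws = [math.exp(lw) for lw in log_ws]
    p_e = sum(ws) / len(ws)                               # unbiased estimate of P(e)
    p_he = sum(w * h for w, h in zip(ws, hs)) / len(ws)   # estimate of P(h, e)
    print("P(e) =", p_e, " P(h | e) =", p_he / p_e)

    # KL(G || P(.|e)) - log P(e); infinite whenever the guide can still
    # propose a path that violates the hard 0/1 evidence, as it can here.
    print("KL up to constant =", -sum(log_ws) / len(log_ws))

Since E(log w) = log P(e) - KL(G || P(·|e)) ≤ log P(e), raising the average log weight over a family of guide programs simultaneously tightens the lower bound on log P(e) and shrinks the KL divergence, which is exactly what makes inference a heuristic search for better guides.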
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
Journal: CoRR
Volume: abs/1006.0991
Pages: -
Publication date: 2008